Two-Stage Temporal Processing for Single-Channel Speech Enhancement
Authors
Abstract
Most conventional speech enhancement methods operating in the spectral domain suffer from a spurious artifact known as musical noise. Moreover, these methods incur extra overhead for noise power spectral density estimation. In this paper, a speech enhancement framework is proposed by cascading two temporal processing stages. The first stage performs excitation-source-based temporal processing, identifying and boosting the speech-specific excitation features present at the gross and fine temporal levels, whereas the second stage provides noise reduction by estimating the standard deviation of the noise in the time domain using a robust estimator. The proposed noise reduction stage is simple to implement and computationally inexpensive, as it does not require spectral-domain noise estimation as a pre-processing step. Experimental results show that the proposed scheme yields, on average, a 60-65% improvement in speech quality (PESQ scores) and intelligibility (STOI scores) at 0 and -5 dB input SNR compared with existing standard approaches.
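The abstract does not specify which robust estimator the second stage uses, so the following is only a minimal sketch of a time-domain noise-reduction stage of the kind described: the median-absolute-deviation (MAD) estimator of the noise standard deviation and the frame-wise soft gain are illustrative assumptions, and the function names (`robust_noise_std`, `temporal_noise_reduction`) are hypothetical.

```python
# Sketch of a time-domain noise-reduction stage (assumptions: MAD-based
# robust noise-standard-deviation estimate and a simple frame-wise gain;
# neither is specified in the abstract).

import numpy as np


def robust_noise_std(frame: np.ndarray) -> float:
    """Estimate the noise standard deviation of one frame via MAD (assumed estimator)."""
    mad = np.median(np.abs(frame - np.median(frame)))
    return 1.4826 * mad  # consistency constant for Gaussian noise


def temporal_noise_reduction(x: np.ndarray, frame_len: int = 256) -> np.ndarray:
    """Attenuate frames whose energy is close to the estimated noise floor."""
    y = x.astype(float).copy()
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = y[start:start + frame_len]
        sigma_n = robust_noise_std(frame)
        frame_rms = np.sqrt(np.mean(frame ** 2) + 1e-12)
        # Simple gain: pass speech-dominant frames, attenuate noise-dominant ones.
        gain = max(0.0, 1.0 - sigma_n / frame_rms)
        y[start:start + frame_len] = gain * frame
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 8000.0
    clean = 0.5 * np.sin(2 * np.pi * 300 * t)          # stand-in for voiced speech
    noisy = clean + 0.2 * rng.standard_normal(t.size)  # additive noise around 0 dB SNR
    enhanced = temporal_noise_reduction(noisy)
    print("output RMS:", np.sqrt(np.mean(enhanced ** 2)))
```

Because the estimate is computed entirely in the time domain, no spectral noise-tracking pass is needed, which is the computational advantage the abstract claims for this stage.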
Similar resources
A Novel Frequency Domain Linearly Constrained Minimum Variance Filter for Speech Enhancement
A reliable speech enhancement method is important for speech applications as a pre-processing step to improve their overall performance. In this paper, we propose a novel frequency-domain method for single-channel speech enhancement. Conventional frequency-domain methods usually neglect the correlation between neighboring time-frequency components of the signals. In the proposed method, we take...
Full text
Application of Blind Source Separation in Speech Processing for Combined Interference Removal and Robust Speaker Detection Using a Two-microphone Setup
A speech enhancement scheme is presented that integrates spatial and temporal signal processing methods for blind denoising in non-stationary noise environments. In a first stage, spatially localized interfering point sources are separated from noisy speech signals recorded by two microphones using a Blind Source Separation (BSS) algorithm, assuming no a priori knowledge about the sources involved...
Full text
Robust ASR in Reverberant Environments Using Temporal Cepstrum Smoothing for Speech Enhancement and an Amplitude Modulation Filterbank for Feature Extraction
This paper presents techniques aimed at improving automatic speech recognition (ASR) in single-channel scenarios in the context of the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. System improvements range from speech enhancement and robust feature extraction to model adaptation and word-based integration of multiple classifiers. The selective temporal cepstrum ...
Full text
The 2nd 'CHiME' Speech Separation and Recognition Challenge: Approaches on Single-channel Source Separation and Model-driven Speech Enhancement
In this paper, we address the small-vocabulary track (track 1) of the CHiME 2 challenge, dedicated to recognizing utterances of a target speaker with small head movements. The utterances are recorded in reverberant room acoustics and corrupted by highly non-stationary noise sources. Such an adverse noise scenario poses a challenge to state-of-the-art automatic speech recognition systems. ...
Full text
Single-Microphone Speech Enhancement Inspired by Auditory System
Title of dissertation: Single-Microphone Speech Enhancement Inspired by Auditory System. Majid Mirbagheri, Doctor of Philosophy, 2014. Dissertation directed by: Professor Shihab Shamma, Department of Electrical and Computer Engineering. Enhancing the quality of speech in noisy environments has been an active area of research due to the abundance of applications dealing with the human voice and the dependence of their per...
Full text